Linear Regularizers Enforce the Strict Saddle Property

Authors

Abstract

Satisfaction of the strict saddle property has become a standard assumption in non-convex optimization, as it ensures that many first-order optimization algorithms will almost always escape saddle points. However, there exist functions in machine learning that do not satisfy this property, such as the loss function of a neural network with at least two hidden layers. First-order methods such as gradient descent may converge to the non-strict saddle points of such functions, and there is currently no way to reliably avoid them. To address this need, we demonstrate that regularizing with a linear term enforces the strict saddle property, and we provide justification for doing so only locally, i.e., when the gradient norm falls below a certain threshold. We analyze the bifurcations that may result from this form of regularization, and then provide a selection rule for regularizers that depends on the objective function. This rule is shown to guarantee the strict saddle property in neighborhoods around a broad class of critical points, and the behavior is demonstrated on numerical examples common in the literature.

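As an informal illustration of the idea in the abstract, the sketch below runs gradient descent on a toy objective whose origin is a non-strict saddle and switches on a fixed linear term only once the gradient norm drops below a threshold. The objective, the regularizer direction v, and the constants epsilon and tau are chosen for illustration; they are not the paper's selection rule.

import numpy as np

# Toy objective with a non-strict saddle at the origin: the Hessian there is
# diag(2, 0), so no negative curvature is visible along the escape direction y.
def f(z):
    x, y = z
    return x**2 + y**6 - y**4

def grad_f(z):
    x, y = z
    return np.array([2.0 * x, 6.0 * y**5 - 4.0 * y**3])

# Linear regularizer <v, z>: its gradient is the constant vector v.  It is
# switched on only while the gradient norm is below tau, in the spirit of the
# local justification described above.  (Illustrative values, not the paper's.)
rng = np.random.default_rng(0)
v = rng.standard_normal(2)
v /= np.linalg.norm(v)
epsilon, tau, step = 5e-3, 2e-2, 1e-2

z = np.array([1.0, 0.0])              # start on the degenerate axis
for _ in range(20000):
    g = grad_f(z)
    if np.linalg.norm(g) < tau:       # near a critical point: perturb
        g = g + epsilon * v
    z = z - step * g

# Plain gradient descent from (1, 0) stalls at the saddle (0, 0); with the
# linear term it escapes and settles near a minimizer at roughly (0, ±0.816),
# offset by O(epsilon) because the regularizer also perturbs the minimizer.
print(z, f(z))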

Similar resources

The Strict Order Property and Generic Automorphisms

If T is a model complete theory with the strict order property, then the theory of the models of T with an automorphism has no model companion.

Fast Rates for Empirical Risk Minimization of Strict Saddle Problems

We derive bounds on the sample complexity of empirical risk minimization (ERM) in the context of minimizing non-convex risks that admit the strict saddle property. Recent progress in non-convex optimization has yielded efficient algorithms for minimizing such functions. Our results imply that these efficient algorithms are statistically stable and also generalize well. In particular, we derive ...

On the Lipschitz property of strict triangular norms

This paper deals with Lipschitz triangular norms (t-norms). A partial answer to an open problem of Alsina, Frank and Schweizer is given with regard to strict t-norms with smooth additive generators. A new notion of local Lipschitz property for arbitrary t-norms is introduced. Some remarkable examples of non-Lipschitz continuous ones are provided.

Efficient First Order Methods for Linear Composite Regularizers

A wide class of regularization problems in machine learning and statistics employ a regularization term which is obtained by composing a simple convex function ω with a linear transformation. This setting includes Group Lasso methods, the Fused Lasso and other total variation methods, multi-task learning methods and many more. In this paper, we present a general approach for computing the proxi...

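As a small, self-contained instance of the composite structure described in that abstract (the penalty, matrix, and values below are a standard textbook example, not taken from that paper): with ω the l1 norm and B the first-difference matrix, ω(Bw) is the total-variation / Fused Lasso penalty.

import numpy as np

# Composite regularizer omega(B @ w): a simple convex function (the l1 norm)
# composed with a linear transformation (the first-difference matrix B).
def fused_lasso_penalty(w):
    B = np.diff(np.eye(len(w)), axis=0)   # row i encodes w[i+1] - w[i]
    return np.abs(B @ w).sum()

print(fused_lasso_penalty(np.array([0.0, 0.0, 1.0, 1.0, 3.0])))  # prints 3.0
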
Bridging Nonlinearities and Stochastic Regularizers with Gaussian Error Linear Units

We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU nonlinearity is the expected transformation of a stochastic regularizer which randomly applies the identity or zero map, combining the intuitions of dropout and zoneout while respecting neuron values. This connection suggests a new probabilistic understanding of nonlinearities. We pe...

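To make the stochastic-regularizer connection above concrete, the short check below uses the standard definition GELU(x) = x·Φ(x), with Φ the standard normal CDF (not stated in the snippet itself), and compares it to the Monte Carlo average of randomly keeping or zeroing the input with keep-probability Φ(x).

import numpy as np
from scipy.stats import norm

# GELU(x) = x * Phi(x), with Phi the standard normal CDF.
def gelu(x):
    return x * norm.cdf(x)

# Stochastic regularizer: keep the input with probability Phi(x), else zero it.
# Its expectation recovers GELU, matching the "expected transformation" claim.
rng = np.random.default_rng(0)
x = 0.7
keep = rng.random(1_000_000) < norm.cdf(x)
print((x * keep).mean(), gelu(x))   # the two values agree up to Monte Carlo error
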
Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i8.26194